Abstract

We present a series of CPS-based intermediate languages suitable
for functional language compilation, arguing that they have practical
benefits over direct-style languages based on A-normal form
(ANF) or monads. Inlining of functions demonstrates the benefits
most clearly: in ANF-based languages, inlining involves a
re-normalization step that rearranges let expressions and possibly
introduces a new ‘join point’ function, and in monadic languages,
commuting conversions must be applied; in contrast, inlining in our
CPS language is a simple substitution of variables for variables.

We present a contification transformation implemented by simple
rewrites on the intermediate language. Exceptions are modelled
using so-called ‘double-barrelled’ CPS. Subtyping on exception
constructors then gives a very straightforward effect analysis for
exceptions. We also show how a graph-based representation of CPS
terms can be implemented extremely efficiently, with linear-time
term simplification.

Categories and Subject Descriptors D.3.4 [Programming Languages]:
Processors - Compilers

General Terms Languages 

Keywords Continuations, continuation passing style, monads,
optimizing compilation, functional programming languages

1. Introduction 

Compiling with continuations is out of fashion. So report the
authors of two classic papers on Continuation-Passing Style in recent
retrospectives:

“In 2002, then, CPS would appear to be a lesson abandoned.”
(McKinley 2004; Shivers 1988)

“Yet, compiler writers abandoned CPS over the ten years
following our paper anyway.” (McKinley 2004; Flanagan
et al. 1993)

This paper argues for a reprieve for CPS: “Compiler writers, give 
continuations a second chance.” 

This conclusion is born of practical experience. In the MLj
and SML.NET whole-program compilers for Standard ML, co-implemented
by the current author, we adopted a direct-style,
monadic intermediate language (Benton et al. 1998, 2004b). In
part, we were interested in effect-based program transformations,
so monads were a natural choice for separating computations from
values in both terms and types. But, given the history of CPS,
there was probably also a feeling that “CPS is for call/cc”, something
that is not a feature of Standard ML.

ICFP’07, October 1-3, 2007, Freiburg, Germany.

Copyright © 2007 ACM [This is the author’s version of the work. It is posted here by
permission of ACM for your personal use. Not for redistribution.]

Recently, the author has re-implemented all stages of the
SML.NET compiler pipeline to use a CPS-based intermediate
language. Such a change was not undertaken lightly, amounting to
roughly 25,000 lines of replaced or new code. There are many
benefits: the language is smaller and more uniform, simplification
of terms is more straightforward and extremely efficient, and
advanced optimizations such as contification are more easily
expressed. We use CPS only because it is a good place to do
optimization; we are not interested in first-class control in the source
language (call/cc), or as a means of implementing other features
such as concurrency. Indeed, as SML.NET targets .NET IL, a
call-stack-based intermediate language with support for structured
exception handling, the compilation process can be summarized as
“transform direct style (SML) into CPS; optimize CPS; transform
CPS back to direct style (.NET IL)”.

1.1 Some history 

CPS. What’s special about CPS? As Appel (1992, p2) put it,
“Continuation-passing style is a program notation that makes every
aspect of control flow and data flow explicit”. An important
consequence is that full β-reduction (function inlining) is sound.
In contrast, for call-by-value languages based on the lambda
calculus, only the weaker β-value rule is sound. For example,
β-reduction cannot be applied to (λx.0) (f y) because f y may
have a side-effect or fail to terminate; but its CPS transform,
f y (λz.(λx.λk.k 0) z k), can be reduced without prejudice.
There are obvious drawbacks: the complexity of CPS terms; the
need to eliminate administrative redexes introduced by the CPS
transformation; and the cost of allocating closures for lambdas
introduced by the CPS transformation, unless some static analysis is
first applied. In fact, these drawbacks are more apparent than real:
the complexity of CPS terms is really a benefit, assigning useful
names to all intermediate computations and control points; the
CPS transformation can be combined with administrative reduction;
and by employing a syntactic separation of continuation- and
source-lambdas it is possible to generate good code directly from
CPS terms.
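The difference between full β and β-value can be observed directly. The following Python sketch is an illustration only, not code from the paper: the function f, its logging side effect, and all helper names are assumptions made for the example. It evaluates (λx.0) (f y) in direct style and in CPS, before and after β-reduction.

```python
# Illustrative sketch: `f`, `log`, and the helper names are hypothetical.
log = []

def f(y):
    log.append(("f", y))      # observable side effect
    return y + 1

def direct(y):
    # (λx.0) (f y) under call-by-value: f y is evaluated first.
    return (lambda x: 0)(f(y))

def direct_beta_reduced(y):
    # Full β-reduction substitutes `f y` for the unused x, so the call vanishes.
    return 0

def f_cps(y, k):
    log.append(("f", y))
    return k(y + 1)

def cps(y, k):
    # f y (λz.(λx.λk.k 0) z k)
    return f_cps(y, lambda z: (lambda x, k2: k2(0))(z, k))

def cps_beta_reduced(y, k):
    # After β-reducing the inner redex: f y (λz.k 0). The call to f remains.
    return f_cps(y, lambda z: k(0))
```

In direct style the reduced program no longer calls f, so the side effect is lost; in CPS the call to f is sequenced explicitly and survives the reduction.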

ANF. In their influential paper “The Essence of Compiling with
Continuations”, Flanagan et al. (1993) observed that “fully developed
CPS compilers do not need to employ the CPS transformation
but can achieve the same results with a simple source-level
transformation”. They proposed a direct-style intermediate language based
on A-normal forms, in which a let construct assigns names to every
intermediate computation. For example, the term above is represented
as let z = f y in (λx.0) z, to which β-reduction can be applied,
obtaining the semantically equivalent let z = f y in 0. This
style of language has become commonplace, not only in compilers,
but also to simplify the study of semantics for impure functional
languages (Pitts 2005, §7.4).

Monads. Very similar to ANF are so-called monadic languages
based on Moggi’s computational lambda calculus (Moggi 1991).
Monads also make sequencing of computations explicit through a
let x ⇐ M in N binding construct, the main difference from ANF
being that let constructs can themselves be let-bound. The separation
of computations from values also provides a place to hang
effect annotations (Wadler and Thiemann 1998) which compilers
can use to perform effect-based optimizing transformations
(Benton et al. 1998).

1.2 The problem 

A-Normal Form is put forward as a compiler intermediate language
with all the benefits of CPS (Flanagan et al. 1993, §6).
Unfortunately, the normal form is not preserved under useful compiler
transformations such as function inlining (β-reduction). Consider
the ANF term

M ≡ let x = (λy.let z = a b in c) d in e.

Now naive β-reduction produces

let x = (let z = a b in c) in e

which is not in normal form. The ‘fix’ is to define a more complex
notion of β-reduction that re-normalizes let constructs (Sabry and
Wadler 1997), in this case producing the normal form
let z = a b in (let x = c in e).
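The re-normalization step can be sketched as a small recursive function. This is a minimal illustration under stated assumptions, not the definition given by Sabry and Wadler: terms are encoded as Python tuples ("let", x, rhs, body), everything else is treated as opaque, and bound names are assumed to be unique so that no capture can occur.

```python
def renormalize(term):
    """Flatten let-bound lets:
    let x = (let z = A in C) in E  ==>  let z = A in (let x = C in E).
    Assumes bound names are unique (hypothetical tuple encoding)."""
    if isinstance(term, tuple) and term[0] == "let":
        _, x, rhs, body = term
        rhs = renormalize(rhs)
        body = renormalize(body)
        if isinstance(rhs, tuple) and rhs[0] == "let":
            _, z, a, c = rhs
            # Re-associate, then re-traverse the new inner let.
            return ("let", z, a, renormalize(("let", x, c, body)))
        return ("let", x, rhs, body)
    return term


# The result of naive β-reduction on M from the text.
reduced = ("let", "x", ("let", "z", ("app", "a", "b"), "c"), "e")
normal = renormalize(reduced)
```

Note that the function re-traverses subterms after each re-association; this repeated traversal is what the CPS representation avoids.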

In contrast, the CPS transform of M, namely

(λy.λk.a b (λz.k c)) d (λx.k e),

simplifies by simple β-reduction to

a b (λz.(λx.k e) c).

As Sabry and Wadler explain in their study of the relationship
between CPS and monadic languages, “the CPS language achieves
this normalization using the metaoperation of substitution which
traverses the CPS term to locate k and replace it by the
continuation thus effectively ‘pushing’ the continuation deep inside the
term” (Sabry and Wadler 1997, §8).

Monadic languages permit let expressions to be nested, but
incorporate so-called commuting conversions (cc’s) such as

let y ⇐ (let x ⇐ M in N) in P
→ let x ⇐ M in (let y ⇐ N in P).

ANF can be seen as a monadic language in which β-reduction is
combined with cc-normalization ensuring that terms remain in
cc-normal form.

All of the above seems quite benign, except for two things:

1. Commuting conversions increase the complexity of simplifying
intermediate language terms. Reductions that strictly decrease
the size of the term can be applied exhaustively on CPS terms,
the number of reductions applied being linear in the size of the
term. The equivalent ANF or monadic reductions must necessarily
involve commuting conversions, which leads to O(n²)
reductions in the worst case. Moreover, as Appel and Jim (1997)
have shown, given a suitable term representation, shrinking
reductions on CPS can be applied in time O(n); it is far from clear
how to amortize the cost of commuting conversions to obtain a
similar measure for ANF or monadic simplification.

2. Real programming languages include conditional expressions,
or, more generally, case analysis on datatype constructors.
These add considerable complexity to reductions on ANF or
monadic terms. Consider the term

let z = (λx.if x then a else b) c in M

This is in ANF, but β-reduction produces

let z = (if c then a else b) in M,

which is not in normal form because it contains a let-bound
conditional expression. To reduce it to normal form, one must
either apply a standard commuting conversion that duplicates
the term M, producing

if c then let z = a in M else let z = b in M,

or introduce a ‘join-point’ function for term M, to give

let k z = M
in if c then let z = a in k z else let z = b in k z.

Observe that k is simply a continuation! In our CPS language,
k is already available in the original term, being the (named)
continuation that is passed to the function to be inlined. The
desire to share subterms almost forces some kind of continuation
construct into the language. Better to start off with a language
that makes continuations explicit.

1.3 Contribution 

Much of the above has been said before by others, though not
always in the context of compilation; in this author’s opinion, the
most illuminating works are Appel (1992); Danvy and Filinski
(1992); Hatcliff and Danvy (1994); Sabry and Wadler (1997). One
contribution of this paper, then, is to draw together these
observations in a form accessible to implementers of functional languages.

As is often the case, the devil is in the details, and so another 
purpose of this paper is to advocate a certain style of CPS that 
works very smoothly for compilation. Continuations are named and 
mandatory (just as every intermediate value is named, so is every 
control point), are second-class (they’re not general lambdas), can 
represent basic blocks and loops, can be shared (typically, through 
common continuations of branches), represent exceptional control 
flow (using double-barrelled CPS), and are typeable (but can be 
used in untyped form too). By refining the types of exception 
values in the double-barrelled variant we get an effect system for 
exceptions ‘for free’. 

We make two additional contributions. Following Appel and
Jim (1997), we describe a graph-based representation of CPS terms
that supports the application of shrinking β-reductions in time
linear in the size of the term. We improve on Appel and Jim’s
selective use of back pointers for accessing variable binders, and
employ the union-find data structure to give amortized
near-constant-time access to binders for all variable occurrences. This leads to
efficient implementation of η-reductions and other transformations.
We present benchmark results comparing our graph-CPS
representation with (a) an earlier graphical representation of the original
monadic language used in our compiler, and (b) the original
functional representation of that language.
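To give a flavour of why union-find helps, here is a minimal sketch of the general idea, assuming a design in which every variable owns a node in a union-find forest and the root of each set records the binder. The class and function names are hypothetical, and this is not a description of the actual SML.NET data structure.

```python
class VarNode:
    """One node per variable; parent links form a union-find forest.
    The `binder` field is meaningful only at the root of a set."""
    def __init__(self, binder):
        self.parent = self
        self.rank = 0
        self.binder = binder

def find(v):
    # Path halving: shorten the path to the root while walking it.
    while v.parent is not v:
        v.parent = v.parent.parent
        v = v.parent
    return v

def binder_of(v):
    return find(v).binder

def substitute(y, x):
    """Record the substitution [y/x]: afterwards every occurrence
    of x resolves to y's binder."""
    rx, ry = find(x), find(y)
    if rx is ry:
        return
    binder = ry.binder
    if rx.rank < ry.rank:
        rx.parent, root = ry, ry
    elif rx.rank > ry.rank:
        ry.parent, root = rx, rx
    else:
        rx.parent, root = ry, ry
        ry.rank += 1
    root.binder = binder      # the merged set keeps y's binder


x, y, z = VarNode("binder of x"), VarNode("binder of y"), VarNode("binder of z")
substitute(y, x)   # [y/x]
substitute(z, y)   # [z/y]
```

With union by rank and path compression, a sequence of such substitutions and lookups runs in amortized near-constant time per operation, which is the property the text relies on.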

Lastly, we show how to transform functions into local
continuations using simple term rewriting rules. This approach to
contification avoids the need for a global dominator analysis (Fluet and
Weeks 2001), and furthermore supports nested and first-class functions.

2. Untyped CPS 

We start by defining an untyped continuation-passing language
λCPS that supports non-recursive functions, the unit value, pairs,
and tagged values. Even for such a simple language, we can cover
many of the issues and demonstrate advantages over alternative,
direct-style languages.





Figure 1. Syntax and scoping rules for untyped language λCPS


In Section 3, we add recursive functions, types, polymorphism,
exceptions, and effect annotations. At that point, the language
resembles a practical CPS-based intermediate language of the sort
that could form the core of a compiler for SML, Caml, or Scheme.

Figure 1 presents the syntax of the untyped language. Here
ordinary variables are ranged over by x, y, f, and g, and continuation
variables are ranged over by k and j. Indices i range over 1, 2. We
specify scoping of variables using well-formedness rules for values
and terms. Here Γ ⊢ V ok means that value V is well-formed in
the scope of a list of ordinary variables Γ, and Γ; Δ ⊢ K ok means
that term K is well-formed in the scope of a list of continuation
variables Δ and a list of ordinary variables Γ. Complete programs
are well-formed in the context of a distinguished top-level
continuation halt. (For the typed variant of our language there will be
typing rules with Γ and Δ generalized to typing contexts.)

We describe the constructs of the language in turn. 

• The expression letval x = V in K binds a value V to a
variable x in the term K. This is the only way a value V
can be used in a term; arguments to functions, case scrutinees,
components of pairs, and so on, must all be simple variables.
Even the unit value () must be bound to a variable before being
used (in the full language, the same holds even for constants
such as 42). This means that there is no need for a general
notion of substitution: we only substitute variables for variables.
Notice also that there is no notion of redundant binding such as
let x ⇐ y in K.

• The expression let x = πᵢ y in K projects the i’th component
of a pair y and binds it to variable x in K.

• The expression letcont k x = K in L introduces a local 
continuation k whose single argument is x and whose body 
is K, to be used in term L. It corresponds to a labelled block in 
traditional lower-level representations. In Section 3 we extend 
local continuations with support for recursion, and so represent 
loops directly. 

• A continuation application k x corresponds to a jump (if k is a
local continuation) or a return (if k is the return continuation
of a function value). As with values, continuations must be
named: function application expressions and case constructs
do not have subterms, but instead mention continuations by
name. We need only ever substitute continuation variables for
continuation variables.

Local continuations can be applied more than once, as in

letcont j y = K in
letcont k₁ x₁ = (letval x = V₁ in j x) in
letcont k₂ x₂ = (letval x = V₂ in j x) in
case z of k₁ [] k₂

Here j is the common continuation, or ‘join point’, for branches
k₁ and k₂.

• The expression f k x is the application of a function f to an
argument x and a continuation k whose parameter receives the
result of applying the function. If k is the return continuation
for the nearest enclosing λ, then the application is a ‘tail call’.
For example, consider the function value

λk x.(letcont j y = g k y in f j x).

Here g is in tail position, and f is not. In effect, we are defining
λx.g(f(x)).

• The construct case x of k₁ [] k₂ expects x to be bound to
a tagged value inᵢ y and then dispatches to the appropriate
continuation kᵢ, passing y as argument.

• Values include the unit value (), pairs (x, y) and tagged values
inᵢ x. Function values λk x.K include a return continuation k
and argument x. Note carefully the well-formedness rule (abs):
its continuation context includes only the return continuation k,
thus enforcing locality of continuations introduced by letcont.
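The constructs and scoping discipline described above can be sketched as a checker over a small term encoding. Everything here is an assumption made for illustration: the tuple encoding, the tag names, and the exact shape of the letcont rule (the body of a non-recursive continuation is checked without k in scope) are not taken from Figure 1, which is not reproduced in this text.

```python
# Hypothetical tuple encoding of the untyped language:
#   values: ("unit",) | ("pair", x, y) | ("in", i, x) | ("lam", k, x, K)
#   terms:  ("letval", x, V, K) | ("letproj", x, i, y, K)
#         | ("letcont", k, x, K, L) | ("appcont", k, x)
#         | ("app", f, k, x) | ("case", x, k1, k2)

def value_ok(gamma, v):
    """Γ ⊢ V ok, with Γ a set of ordinary variable names."""
    tag = v[0]
    if tag == "unit":
        return True
    if tag == "pair":
        return v[1] in gamma and v[2] in gamma
    if tag == "in":
        return v[1] in (1, 2) and v[2] in gamma
    if tag == "lam":
        _, k, x, body = v
        # Rule (abs): the body sees only its own return continuation k.
        return term_ok(gamma | {x}, {k}, body)
    return False

def term_ok(gamma, delta, t):
    """Γ; Δ ⊢ K ok, with Δ a set of continuation variable names."""
    tag = t[0]
    if tag == "letval":
        _, x, v, body = t
        return value_ok(gamma, v) and term_ok(gamma | {x}, delta, body)
    if tag == "letproj":
        _, x, i, y, body = t
        return i in (1, 2) and y in gamma and term_ok(gamma | {x}, delta, body)
    if tag == "letcont":
        _, k, x, kbody, rest = t
        return (term_ok(gamma | {x}, delta, kbody)
                and term_ok(gamma, delta | {k}, rest))
    if tag == "appcont":
        return t[1] in delta and t[2] in gamma
    if tag == "app":
        _, f, k, x = t
        return f in gamma and k in delta and x in gamma
    if tag == "case":
        _, x, k1, k2 = t
        return x in gamma and k1 in delta and k2 in delta
    return False


# λk x.(letcont j y = g k y in f j x), with f and g free.
compose = ("lam", "k", "x",
           ("letcont", "j", "y", ("app", "g", "k", "y"), ("app", "f", "j", "x")))
# A function body that jumps to a local continuation j of its context.
escaping = ("letcont", "j", "y", ("appcont", "halt", "y"),
            ("letval", "h", ("lam", "k", "x", ("appcont", "j", "x")),
             ("appcont", "halt", "h")))
```

The second example is rejected because the function body mentions j, which is not its own return continuation; this is the locality property that rule (abs) enforces.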

The semantics is given by environment-style evaluation rules,
presented in Figure 2. As is conventional, we define a syntax of
run-time values, ranged over by r, supporting the unit value, pairs,
constructor applications, and closures. Environments map variables
to run-time values, and continuation variables to continuation
values. Continuation values are represented in a closure form, which
gives the impression that they are first-class. An alternative would
be to model stack frames more directly and thereby demonstrate
that continuations are in fact just code pointers. For the purpose
of simply defining the meaning of programs we prefer the
closure-based semantics.

The function ⟦·⟧ρ interprets a value expression in an environment ρ.
Terms are evaluated in an environment ρ; the only observation
that we can make of programs is termination, i.e. the
application of the top-level continuation halt to a unit value.
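An environment-style evaluator in this spirit can be sketched as follows. It is not a transcription of Figure 2: the tuple encoding of terms, the representation of run-time values, and the choice to return whatever value reaches halt (rather than observing only termination) are assumptions made for the example. Ordinary and continuation variables share one dictionary, so their names are assumed to be distinct.

```python
# Hypothetical encoding:
#   values: ("unit",) | ("pair", x, y) | ("in", i, x) | ("lam", k, x, K)
#   terms:  ("letval", x, V, K) | ("letproj", x, i, y, K)
#         | ("letcont", k, x, K, L) | ("appcont", k, x)
#         | ("app", f, k, x) | ("case", x, k1, k2)

def eval_value(v, env):
    tag = v[0]
    if tag == "unit":
        return ("unit",)
    if tag == "pair":
        return ("pair", env[v[1]], env[v[2]])
    if tag == "in":
        return ("in", v[1], env[v[2]])
    if tag == "lam":
        return ("closure", env, v[1], v[2], v[3])
    raise ValueError(tag)

def run(term, env):
    """Evaluate until halt is applied. Every step is a jump, so a loop suffices."""
    while True:
        tag = term[0]
        if tag == "letval":
            _, x, v, body = term
            env, term = {**env, x: eval_value(v, env)}, body
        elif tag == "letproj":
            _, x, i, y, body = term
            env, term = {**env, x: env[y][i]}, body   # ("pair", a, b)[i]
        elif tag == "letcont":
            _, k, x, kbody, rest = term
            env, term = {**env, k: ("cont", env, x, kbody)}, rest
        elif tag in ("appcont", "case"):
            if tag == "appcont":
                cont, arg = env[term[1]], env[term[2]]
            else:
                _, i, arg = env[term[1]]               # ("in", i, payload)
                cont = env[term[2] if i == 1 else term[3]]
            if cont[0] == "halt":
                return arg
            _, cenv, param, body = cont
            env, term = {**cenv, param: arg}, body
        elif tag == "app":
            _, f, k, x = term
            _, fenv, kparam, xparam, body = env[f]
            env, term = {**fenv, kparam: env[k], xparam: env[x]}, body
        else:
            raise ValueError(tag)


# letval f = λk x.(letval p = (x, x) in k p) in
# letval u = () in letval t = in1 u in
# letcont j r = (let a = π2 r in halt a) in f j t
program = ("letval", "f",
           ("lam", "k", "x", ("letval", "p", ("pair", "x", "x"), ("appcont", "k", "p"))),
           ("letval", "u", ("unit",),
            ("letval", "t", ("in", 1, "u"),
             ("letcont", "j", "r", ("letproj", "a", 2, "r", ("appcont", "halt", "a")),
              ("app", "f", "j", "t")))))
result = run(program, {"halt": ("halt",)})
```

Continuations are stored here in closure form, as in the text; because a term never returns, the evaluator needs no stack of its own.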


2.1 CPS transformation 

To illustrate how the CPS-based language can be used for
functional language compilation, consider a fragment of Standard ML





Figure 2. Evaluation rules for λCPS

Figure 3. Naive CPS transformation of toy ML into λCPS


whose expressions (ranged over by e) have the following syntax:

ML ∋ e ::= x | e e' | fn x => e | (e, e') | #i e | ()
         | ini e | let val x = e in e' end
         | case e of in1 x1 => e1 | in2 x2 => e2

We assume a datatype declared by

datatype ('a,'b) sum = in1 of 'a | in2 of 'b

Expressions in this language can be translated into untyped CPS
terms using the function shown in Figure 3. This is an adaptation
of the standard higher-order one-pass call-by-value transformation
(Danvy and Filinski 1992). An alternative, first-order,
transformation is described by Danvy and Nielsen (2003).

The transformation works by taking a translation-time function κ
as argument, representing the ‘context’ into which the
translation of the source term is embedded. For our language, the
context’s argument is a variable, as all intermediate results are named.
Note some conventions used in Figure 3: translation-time lambda
abstraction is written using a typographically distinct λ and
translation-time application is written κ(...), to distinguish them
from the λ and juxtaposition used to denote lambda abstraction and
application in the target language. Also note that any object
variables present in the target terms but not in the source are
assumed fresh with respect to all other bound variables.

The translation is one-pass in the sense that it introduces no
‘administrative reductions’ (here, β-redexes for continuations) that
must be removed in a separate phase, except for let constructs (to
avoid these also would require analysis of the let expression; we
prefer to apply simplifying rewrites on the output of the
transformation). However, the translation is naive in two ways. First, it
introduces η-redexes for continuations when translating tail function
applications. For example, ⟦fn x => f (x, y)⟧ κ produces

letval g = λk x.(letval p = (x, y) in letcont j z = k z in f j p)
in κ(g)

whose η-redex (the binding letcont j z = k z) can be eliminated to
obtain the more compact

letval g = (λk x.letval p = (x, y) in f k p) in κ(g).

Second, the translation of case duplicates the context; consider,
for example, f (case x of in1 x1 => e1 | in2 x2 => e2), whose
translation involves two calls to f.

The more sophisticated translation scheme of Figure 4 avoids
both these problems; again, this is based on Danvy and Filinski
(1992). The translation function ⟦·⟧ is as before, except (a) it
introduces a join point continuation to avoid context duplication for
case, and (b) for terms in tail position it uses an alternative
translation function (|·|) that takes an explicit continuation variable as
argument instead of a context.
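The shape of such a translation can be sketched in Python, using Python functions for translation-time contexts. This is a sketch of the standard higher-order one-pass technique for a fragment without case or let, not a reproduction of Figures 3 and 4; the tuple encodings, the helper names cps, cps_tail and fresh, and the naming scheme for fresh variables are all assumptions. Source variable names are assumed not to end in digits, so that fresh names cannot clash with them.

```python
import itertools

# Source terms (hypothetical encoding):
#   ("var", x) | ("unit",) | ("pair", e1, e2) | ("proj", i, e)
#   | ("fn", x, e) | ("app", e1, e2)
_counter = itertools.count(1)

def fresh(prefix):
    return f"{prefix}{next(_counter)}"

def cps(e, kappa):
    """Translate e in context kappa, a Python function from a
    variable name to a target term."""
    tag = e[0]
    if tag == "var":
        return kappa(e[1])
    if tag == "unit":
        x = fresh("x")
        return ("letval", x, ("unit",), kappa(x))
    if tag == "pair":
        def with_first(a):
            def with_second(b):
                x = fresh("x")
                return ("letval", x, ("pair", a, b), kappa(x))
            return cps(e[2], with_second)
        return cps(e[1], with_first)
    if tag == "proj":
        def with_pair(p):
            x = fresh("x")
            return ("letproj", x, e[1], p, kappa(x))
        return cps(e[2], with_pair)
    if tag == "fn":
        f, k = fresh("f"), fresh("k")
        # The body is in tail position with respect to k.
        return ("letval", f, ("lam", k, e[1], cps_tail(e[2], k)), kappa(f))
    if tag == "app":
        def with_fun(f):
            def with_arg(a):
                k, x = fresh("k"), fresh("x")
                return ("letcont", k, x, kappa(x), ("app", f, k, a))
            return cps(e[2], with_arg)
        return cps(e[1], with_fun)
    raise ValueError(tag)

def cps_tail(e, k):
    """Translate e in tail position, passing its result to the
    continuation variable k."""
    if e[0] == "app":
        # Tail call: pass k directly instead of a fresh η-expanded continuation.
        return cps(e[1], lambda f: cps(e[2], lambda a: ("app", f, k, a)))
    return cps(e, lambda x: ("appcont", k, x))


# fn x => f (x, y)
source = ("fn", "x", ("app", ("var", "f"), ("pair", ("var", "x"), ("var", "y"))))
target = cps(source, lambda g: ("appcont", "halt", g))
```

For the example from the text, the tail-position translation passes the function's return continuation straight to f, so no η-redex letcont j z = k z is generated.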

2.2 Rewrites 

After translating from source language to intermediate language,
most functional language compilers perform a number of
optimization phases that are implemented as transformations on
intermediate language terms. Some phases are specific (for example,
arity-raising of functions, or hoisting expressions out of loops) but
usually there is some set of general rewrites based on standard
reductions in the lambda-calculus. Figure 5 presents some general
rewrites for our CPS-based language. The rewrites look more
complicated than the equivalent reductions in the lambda-calculus
because the naming of intermediate values forces introduction and
elimination forms apart. For example, β-reduction on pairs, which
in the lambda calculus is simply πᵢ (e₁, e₂) → eᵢ, has to support
an intervening context C. In practice, the rewrites are not hard to
implement. In functional style, value bindings (e.g. pairs) are stored in
an environment which is accessed at the reduction site (e.g. a
projection). In imperative style, bindings are accessed directly through
pointers, as we shall see in Section 4.1.
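The functional style just described can be sketched for pair projection alone. The sketch below is an assumption-laden illustration rather than the SML.NET simplifier: terms use a hypothetical tuple encoding, bound names are assumed unique, and only the pair rule is implemented. An environment records which variables are bound to pairs, and a variable-for-variable substitution is carried along.

```python
# Hypothetical encoding:
#   values: ("unit",) | ("pair", x, y) | ("in", i, x) | ("lam", k, x, K)
#   terms:  ("letval", x, V, K) | ("letproj", x, i, y, K)
#         | ("letcont", k, x, K, L) | ("appcont", k, x)
#         | ("app", f, k, x) | ("case", x, k1, k2)

def simplify(term, pairs=None, subst=None):
    """Apply β-reduction on pairs top-down.
    pairs: variables known to be bound to pairs, with their components.
    subst: pending variable-for-variable substitutions."""
    pairs = pairs or {}
    subst = subst or {}
    s = lambda v: subst.get(v, v)
    tag = term[0]
    if tag == "letval":
        _, x, v, body = term
        if v[0] == "pair":
            v = ("pair", s(v[1]), s(v[2]))
            pairs = {**pairs, x: (v[1], v[2])}     # record the binding
        elif v[0] == "in":
            v = ("in", v[1], s(v[2]))
        elif v[0] == "lam":
            v = ("lam", v[1], v[2], simplify(v[3], pairs, subst))
        return ("letval", x, v, simplify(body, pairs, subst))
    if tag == "letproj":
        _, x, i, y, body = term
        y = s(y)
        if y in pairs:
            # Reduction site: look up the pair and substitute its component for x.
            return simplify(body, pairs, {**subst, x: pairs[y][i - 1]})
        return ("letproj", x, i, y, simplify(body, pairs, subst))
    if tag == "letcont":
        _, k, x, kbody, rest = term
        return ("letcont", k, x,
                simplify(kbody, pairs, subst), simplify(rest, pairs, subst))
    if tag == "appcont":
        return ("appcont", term[1], s(term[2]))
    if tag == "app":
        return ("app", s(term[1]), term[2], s(term[3]))
    if tag == "case":
        return ("case", s(term[1]), term[2], term[3])
    raise ValueError(tag)


# letval p = (x, y) in letcont j w = k p in let z = π1 p in g j z
before = ("letval", "p", ("pair", "x", "y"),
          ("letcont", "j", "w", ("appcont", "k", "p"),
           ("letproj", "z", 1, "p", ("app", "g", "j", "z"))))
after = simplify(before)
```

Only the projection is rewritten; the other use of p is left alone, and the binding of p itself stays in place until a separate dead-code rewrite decides whether it is still needed.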

The payoff from this style of rewrite is the selective use of β
rules. For example, in a lambda-calculus extended with a let
construct, one might perform the reduction let p = (x, y) in M →
M[(x, y)/p], but this would be undesirable unless every
substitution of (x, y) for p in M produced a redex. In our language,
letval p = (x, y) in ... k p ... let z = π₁ p in K reduces to
letval p = (x, y) in ... k p ... K[x/z], which applies the β-Pair
rule to π₁ p but preserves other occurrences of p.

Figure 4. Tail CPS transformation (changes and additions only shown)


C ::= [] | letval x = V in C | let x = πᵢ y in C
    | letval x = λk x.C in K | letcont k x = C in K
    | letcont k x = K in C

Dead-Cont    letcont k x = L in K → K          (k not free in K)
Dead-Val     letval x = V in K → K             (x not free in K)

β-Cont       letcont k x = K in C[k y]
             → letcont k x = K in C[K[y/x]]

β-Fun        letval f = λk x.K in C[f j y]
             → letval f = λk x.K in C[K[y/x, j/k]]

β-Case       letval x = inᵢ y in C[case x of k₁ [] k₂]
             → letval x = inᵢ y in C[kᵢ y]

β-Pair       letval x = (x₁, x₂) in C[let y = πᵢ x in K]
             → letval x = (x₁, x₂) in C[K[xᵢ/y]]

β-Cont-Lin   letcont k x = K in C[k y]
             → C[K[y/x]]                       (if k not free in C)

β-Fun-Lin    letval f = λk x.K in C[f j y]
             → C[K[y/x, j/k]]                  (f ≠ y, f not free in C)

η-Cont       letcont k x = j x in K → K[j/k]

η-Fun        letval f = λk x.g k x in K → K[g/f]

η-Pair       let xᵢ = πᵢ x in C[let xⱼ = πⱼ x
               in C'[letval y = (x₁, x₂) in K]]
             → let xᵢ = πᵢ x in C[let xⱼ = πⱼ x
               in C'[K[x/y]]]                  ({i, j} = {1, 2})

η-Case       letcont kᵢ x₁ = (letval y₁ = inᵢ x₁ in k y₁) in
               C[letcont kⱼ x₂ = (letval y₂ = inⱼ x₂ in k y₂) in
                 C'[case x of k₁ [] k₂]]
             → letcont kᵢ x₁ = (letval y₁ = inᵢ x₁ in k y₁) in
               C[letcont kⱼ x₂ = (letval y₂ = inⱼ x₂ in k y₂) in
                 C'[k x]]                      ({i, j} = {1, 2})


Figure 5. General rewrites for λCPS

It is easy to show that all rewrites preserve well-formedness of 
terms. In particular, the scoping of local continuations is respected. 

The β-Fun and β-Cont reductions are inlining transformations
for functions and continuations. The remainder of the reductions
we call shrinking reductions, as they strictly decrease the size
of terms (Appel and Jim 1997). The β-Cont-Lin and β-Fun-Lin
reductions are special cases of β-reduction for linear uses of a
variable, in effect combining Dead- and β-reductions. Shrinking
reductions can be applied exhaustively on a term, and are typically
used to ‘clean up’ a term after some special-purpose global
transformation such as arity-raising or monomorphisation. Clearly the
number of such reductions will be linear in the size of the term;
moreover, using the representation of terms described in Section 4
it is possible to perform such reductions in linear time.
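As a small illustration of exhaustive shrinking, the following sketch applies only the two Dead- rules in a single bottom-up pass. It uses a hypothetical tuple encoding and Python sets of free names; it is not the graph-based representation of Section 4, and the set operations make it simple rather than strictly linear-time.

```python
# Hypothetical encoding:
#   values: ("unit",) | ("pair", x, y) | ("in", i, x) | ("lam", k, x, K)
#   terms:  ("letval", x, V, K) | ("letproj", x, i, y, K)
#         | ("letcont", k, x, K, L) | ("appcont", k, x)
#         | ("app", f, k, x) | ("case", x, k1, k2)

def shrink(t):
    """Return (term', free names of term') after removing dead
    letval and letcont bindings, working bottom-up."""
    tag = t[0]
    if tag == "letval":
        _, x, v, body = t
        body, fv = shrink(body)
        if x not in fv:
            return body, fv                       # Dead-Val
        if v[0] == "lam":
            lam_body, lam_fv = shrink(v[3])
            vfv = lam_fv - {v[1], v[2]}
            v = ("lam", v[1], v[2], lam_body)
        else:
            vfv = {a for a in v[1:] if isinstance(a, str)}
        return ("letval", x, v, body), (fv - {x}) | vfv
    if tag == "letproj":
        _, x, i, y, body = t
        body, fv = shrink(body)
        return ("letproj", x, i, y, body), (fv - {x}) | {y}
    if tag == "letcont":
        _, k, x, kbody, rest = t
        rest, fv = shrink(rest)
        if k not in fv:
            return rest, fv                       # Dead-Cont
        kbody, kfv = shrink(kbody)
        return ("letcont", k, x, kbody, rest), (fv - {k}) | (kfv - {x})
    # appcont, app, case: every component is a name.
    return t, set(t[1:])


# letval a = () in letval p = (a, a) in letcont j y = k p in k x
dead = ("letval", "a", ("unit",),
        ("letval", "p", ("pair", "a", "a"),
         ("letcont", "j", "y", ("appcont", "k", "p"), ("appcont", "k", "x"))))
cleaned, free = shrink(dead)
```

Because a dead binding is discarded before its right-hand side is inspected, removing j makes p dead, which in turn makes a dead, all within the same pass.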

2.3 Comparison with a monadic language 

The original implementations of the MLj and SML.NET compilers
used monadic languages inspired by Moggi’s computational
lambda calculus (Moggi 1991). Figure 6 presents syntax for a
monadic language λmon and selected reduction rules.

The defining feature of monadic languages is that sequencing
of computations is made explicit through the let construct; values
are converted into trivial computations using the val construct.
Monadic languages share with CPS languages the property that
familiar β-reduction on functions is sound, as evaluation of the
function argument is made explicit through let. But there are drawbacks,
as we outlined in the Introduction. (An orthogonal issue - as for
CPS based languages - is whether values can appear anywhere
except inside val. In λmon, for ease of presentation, we permit values
to be embedded in applications, pairs, and so on, whereas for λCPS
we insist that they are named. The difference shows up in the
reduction rules, which in λCPS make use of contexts. It should be
noted that the drawbacks of monadic languages that we are about
to discuss are unaffected by this choice.)

Grammar

MTm  ∋ M, N ::= val v | let x ⇐ M in N | v w | πᵢ v
              | case v of in₁ x₁.M₁ [] in₂ x₂.M₂
MVal ∋ v, w ::= x | λx.M | (v, w) | inᵢ v | ()

Reductions

β-Let     let x ⇐ val v in M → M[v/x]
η-Let     let x ⇐ M in val x → M
CC-Let    let x₂ ⇐ (let x₁ ⇐ M₁ in M₂) in N
          → let x₁ ⇐ M₁ in (let x₂ ⇐ M₂ in N)
CC-Case   let x ⇐ (case v of in₁ x₁.M₁ [] in₂ x₂.M₂) in N
          → let f ⇐ val λx.N
            in case v of in₁ x₁.let x ⇐ M₁ in f x
                      [] in₂ x₂.let x ⇐ M₂ in f x
β-Pair    πᵢ (v₁, v₂) → vᵢ
β-Fun     (λx.M) v → M[v/x]
β-Case    case inᵢ v of in₁ x₁.M₁ [] in₂ x₂.M₂ → Mᵢ[v/xᵢ]

Figure 6. Syntax and selected rewrites for monadic language λmon


Problem 1: need for let/let commuting conversion. The basic
reductions listed in Figure 6 have corresponding reductions in CPS.
The let construct itself has β and η rules which correspond to
β-Cont and η-Cont for λCPS (consider the CPS transforms of
the terms). In contrast to CPS-based languages, though, monadic
languages include a so-called commuting conversion, expressing
associativity for let:

CC-Let    let x₂ ⇐ (let x₁ ⇐ M₁ in M₂) in N
          → let x₁ ⇐ M₁ in (let x₂ ⇐ M₂ in N)

This reduction plays a vital role in exposing further reductions.
Consider the source expression

#1 ((fn x => (g x, x)) y)

Its translation into λmon is

let z₂ ⇐ (λx.let z₁ ⇐ g x in val (z₁, x)) y in π₁ z₂.

Now suppose that we apply β-Fun, to get

let z₂ ⇐ (let z₁ ⇐ g y in val (z₁, y)) in π₁ z₂.

In order to make any further progress, we must use CC-Let to get

let z₁ ⇐ g y in let z₂ ⇐ val (z₁, y) in π₁ z₂.

Now we can apply β-Let and β-Pair to get let z₁ ⇐ g y in z₁
which further reduces by η-Let to g y.

Solution 1: Use CPS. Now take the original source expression
and translate it into our CPS-based language, with k representing
the enclosing continuation,

letval f = λj₁ x.
  (letcont j₂ z₁ = (letval z₂ = (z₁, x) in j₁ z₂) in g j₂ x)
in letcont j₃ z₃ = (let z₄ = π₁ z₃ in k z₄)
in f j₃ y

Applying β-Fun-Lin gives the following, with y substituted for x
and j₃ for j₁:

letcont j₃ z₃ = (let z₄ = π₁ z₃ in k z₄)
in letcont j₂ z₁ = (letval z₂ = (z₁, y) in j₃ z₂) in g j₂ y

and by β-Cont-Lin on j₃ we get

letcont j₂ z₁ =
  (letval z₂ = (z₁, y) in let z₄ = π₁ z₂ in k z₄)
in g j₂ y.

Finally, use of β-Pair and Dead-Val produces letcont j₂ z₁ =
k z₁ in g j₂ y which reduces by η-Cont to g k y. All reductions
were simple uses of β and η rules, without the need for the
additional ‘administrative’ reduction CC-Let.

Problem 2: quadratic blowup. The CC-Let reduction seems
innocent enough. But observe that it is not a shrinking reduction, so
it’s not immediately clear whether reduction will terminate.
Fortunately, the combination of CC-Let and shrinking β/η-reductions
of Figure 6 does terminate (Lindley 2005), and moreover there is
a formal correspondence between the reductions of the monadic
language and CPS (Hatcliff and Danvy 1994). Unfortunately, the
order in which conversions are applied is critical to the efficiency
of simplification by reduction. Consider the following term in λmon:

let fₙ ⇐ val (λxₙ.let yₙ ⇐ g xₙ in g yₙ) in
let fₙ₋₁ ⇐ val (λxₙ₋₁.let yₙ₋₁ ⇐ fₙ xₙ₋₁ in g yₙ₋₁) in
⋮
let f₁ ⇐ val (λx₁.let y₁ ⇐ f₂ x₁ in g y₁) in f₁ a

If (linear) β-Fun is applied to all functions in this term, followed
by a sequence of CC-Let reductions, then no redexes remain
after O(n) reductions. If, however, the commuting conversions
are interleaved with β-Fun, then O(n²) reductions are required.
(There are other examples where it is better to apply commuting
conversions first.) Although this is a pathological example, the
‘simplifier’ was a major bottleneck in the MLj and SML.NET
compilers (Benton et al. 2004a), in part (we believe) because of
the need to perform commuting conversions.

Solution 2: Use CPS. It is interesting to note that monadic terms
can be translated into CPS in linear time; shrinking reductions can
be applied exhaustively there in linear time (see Section 4); and the
term can be translated back into the monadic language in linear
time. Therefore the quadratic blowup we saw above is not
fundamental, and there may be some means of amortizing the cost of
commuting conversions so that exhaustive reductions can be
performed in linear time. Nevertheless, it is surely better to have the
term in CPS from the start, and enjoy the benefit of linear-time
simplification.

Problem 3: need for let/case commuting conversion. Matters
become more complicated with conditionals or case constructs.
Consider the source expression

g' (g ((fn x => case x of in1 x1 => (x1, x3) | in2 x2 => g'' x) y))

Its translation into λmon is

let z ⇐ (λx.case x of in₁ x₁.val (x₁, x₃) [] in₂ x₂.g'' x) y in
let z' ⇐ g z in g' z'.

This reduces by β-Fun to

let z ⇐ (case y of in₁ x₁.val (x₁, x₃) [] in₂ x₂.g'' y) in
let z' ⇐ g z in g' z'.

At this point, we want to ‘float’ the case expression out of the let.
The proof-theoretic commuting conversion that expresses this
rewrite is

let x ⇐ (case v of in₁ x₁.M₁ [] in₂ x₂.M₂) in N
→ case v of in₁ x₁.(let x ⇐ M₁ in N) [] in₂ x₂.(let x ⇐ M₂ in N)

This can have the effect of exposing more redexes; unfortunately,
it also duplicates N, which is not so desirable. So instead,
compilers typically adopt a variation of this commuting conversion that
shares N between the branches, creating a so-called join point
function:

CC-Case   let x ⇐ (case v of in₁ x₁.M₁ [] in₂ x₂.M₂) in N
          → let f ⇐ val λx.N
            in case v of in₁ x₁.let x ⇐ M₁ in f x
                      [] in₂ x₂.let x ⇐ M₂ in f x

Applying this to our example produces the result

let f ⇐ val (λz.let z' ⇐ g z in g' z') in
case y of
   in₁ x₁.(let z ⇐ val (x₁, x₃) in f z)
[] in₂ x₂.(let z ⇐ g'' y in f z).

As observed earlier, join points such as f are just continuations.

As observed earlier, join points such as / are just continuations. 

Solution 3: Use CPS. Consider the CPS transformation of the
original source expression, with k being the enclosing return
continuation.

letcont j' z' = g' k z' in
letcont j z = g j' z in
letval f = λj'' x.
  (letcont k₁ x₁ = (letval z'' = (x₁, x₃) in j'' z'') in
   letcont k₂ x₂ = g'' j'' x in
   case x of k₁ [] k₂)
in f j y

Applying β-Fun-Lin immediately produces the following term,
with y substituted for x and j for j'':

letcont j' z' = g' k z' in
letcont j z = g j' z in
letcont k₁ x₁ = (letval z'' = (x₁, x₃) in j z'') in
letcont k₂ x₂ = g'' j y in
case y of k₁ [] k₂

There is no need to apply anything analogous to CC-Case, or to
introduce a join point: the original term already had one, namely j,
which was substituted for the return continuation j'' of the function.

The absence of explicit join points in monadic languages is
an annoyance in itself. When join points are represented as ordinary
functions, a separate static analysis is needed to
determine that such functions can be compiled efficiently as basic
blocks.

Explicitly named local continuations in CPS have the advantage 
that locality is immediate from the syntax, and preserved under 
transformation; furthermore traditional intra-procedural compiler 
optimizations (such as those performed on SSA representations) 
can be adapted to operate on functions in CPS form. 

2.4 Comparison with ANF 

Flanagan et al. (1993) propose an alternative to CPS which they call
A-Normal Form, or ANF for short. This is defined as the image
of the composition of the CPS, administrative normalization and
inverse CPS transformations.


Instead of going via a CPS language, the transformation into ANF 
can be performed in one pass, as suggested by the dotted line A in 
the diagram above. 1 A similar transformation has been studied by 
Danvy (2003). 

As Flanagan et al. (1993) suggest, the “back end of an A-normal
form compiler can employ the same code generation techniques
that a CPS compiler uses”. However, as we mentioned in the
Introduction, it is not so apparent whether ANF is ideally suited to
optimization. After all, it is not even closed under the usual rule
for β-reduction (λx.A) v → A[v/x]. As Sabry and Wadler
(1997) later explained, it is necessary to combine substitution with
re-normalization to get a sound rule for β-reduction: essentially the
repeated application of CC-Let. They do not consider conditionals
or case constructs, but presumably to maintain terms in ANF it
is necessary to normalize with respect to CC-Let and CC-Case
following function inlining.

It is clear, then, that ANF suffers all the same problems that
affect monadic languages: the need for (non-shrinking) commuting
conversions, quadratic blowup of ‘linear’ reductions, and the
absence of explicit join points.

3. Typed CPS with exceptions 

We now add types and other features to the language of Section 2.
In the untyped world, we can model recursion using a call-by-value
fixed-point combinator. For a typed language, we must add
explicit support for recursive functions - which, in any case, is more
practical. Moreover, we would like to express recursive
continuations too, in order to represent loops. Finally, to support
exceptions, functions in the extended language take two continuations:
an exception-handler continuation, and a return continuation. This
is the so-called double-barrelled continuation-passing style
(Thielecke 2002).

Figure 7 presents the syntax and typing rules for the extended
language λ^T_CPS. Types of values are ranged over by τ, σ and include
unit, a type of exceptions, products, sums and functions. (To save
space, we omit constructs for manipulating exception values.)
Continuation types have the form ¬τ, which is interpreted as
‘continuations accepting values of type τ’. Note that for simplicity
of presentation we do not annotate terms with types; it is an easy
exercise to add sufficient annotations to determine unique typing
derivations. Typing judgments for values have the form Γ ⊢ V : τ, in
which Γ maps variables to value types. Judgments for terms have the
form Γ; Δ ⊢ K ok, in which the additional context Δ maps continuation
variables to continuation types. Complete programs are typed
in the context of a single top-level continuation halt accepting unit
values.

We consider each construct in turn. 


[Diagram: the languages CS and A(CS), related by the transformation A]



The source language CS is Core Scheme (corresponding to our
fragment of ML), and their CPS transformation composed with
β-normalization is equivalent to our one-pass transformation [•] of
Figure 4.

The language A(CS) corresponds precisely to CC-Let/CC-Case
normal forms in λ_mon. We can express these normal forms
by a grammar:

ATm  ∋ A, B ::= R | let x <= R in A
              | case v of in1 x1.A1 [] in2 x2.A2
ACmp ∋ R    ::= v w | π_i v | v
AVal ∋ v, w ::= x | λx.A | (v, w) | in_i v | ()

• The letval construct is as before, with the obvious typing rule
and associated value typing rules. Likewise for projections.

• The letcont construct is generalized to support mutually
recursive continuations. These represent loops directly. Local
continuations are also used for exception handlers.

• The letfun construct introduces a set of mutually recursive
functions; each function takes a return continuation k, an
exception handler continuation h, and an argument x. As a language
construct, there is nothing special about the handler continuation
except that its type is fixed to be ¬exn, and so a function
type τ → σ is constructed from the argument type τ and the
type ¬σ of the return continuation. What really distinguishes
exceptions is (a) their role in the translation from source
language into CPS, and (b) typical strategies for generating code.


1 Though, curiously, the ‘A-normalization algorithm’ in (Flanagan et al.
1993, Fig. 9) does not actually normalize terms, as it leaves let-bound
conditionals alone.

• Continuation application k x is as before. Now there are four
possibilities for k: it may be a recursive or non-recursive
occurrence of a letcont-bound continuation, compiled as a jump, it
may be the return continuation, or it may be a handler
continuation, which is interpreted as raising an exception.

• Function application f k h x includes a handler continuation
argument h. If k is the return continuation for the nearest
enclosing function, and h is its handler continuation, then
the application is a tail call. If k is a local continuation and h
is the handler continuation for the enclosing function, then
the application is a non-tail call without an explicit exception
handler - so exceptions are propagated to the context.
Otherwise, h is an explicit handler for exceptions raised by
the function. (Other combinations are possible; for example in
letfun f k h x = C[g h h y] in K the function application is
essentially raise (g y) in a tail position.)

• Branching using case is as before. 
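The case analysis on function applications described above can be sketched as a small classifier. This is only an illustration: the datatype `callKind`, the record fields `retK`, `hdlK` and `isLocal`, and the choice to lump all remaining combinations into `Other` are assumptions of the sketch, not part of the language definition.

```sml
(* Hypothetical classification of an application  f k h x. *)
datatype callKind =
    TailCall            (* k, h are the enclosing function's own continuations *)
  | NonTailNoHandler    (* local return point, exceptions propagate *)
  | ExplicitHandler     (* h is a letcont-bound handler *)
  | Other               (* e.g.  g h h y,  essentially  raise (g y) *)

(* retK, hdlK: return and handler continuations of the nearest enclosing
   function; isLocal: tests whether a continuation is letcont-bound. *)
fun classify {retK, hdlK, isLocal} (k, h) =
  if k = retK andalso h = hdlK then TailCall
  else if isLocal k andalso h = hdlK then NonTailNoHandler
  else if isLocal h then ExplicitHandler
  else Other
```

A code generator could use such a classification to choose between a jump-style tail call, an ordinary call, and a call wrapped in a handler.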

3.1 CPS transformation 

We can extend the fragment of ML described in Section 2.1 with 
exceptions and recursive functions: 

ML    ∋ e ::= ... | raise e | e1 handle x => e2
            | let fun d in e end
MLDef ∋ d ::= f x = e

The revised CPS transformation is shown in Figure 8 (see (Kim
et al. 1998) for the selective use of a double-barrelled CPS
transformation). Both [•] and (|•|) take an additional argument: a
continuation h for the exception handler in scope. Then raise e is
translated as an application of h. For e1 handle x => e2 a local handler
continuation h' is declared whose body is the translation of e2; this
is then used as the handler passed to the translation function for e1.
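To make the role of the extra handler argument concrete, here is a minimal sketch of a double-barrelled transformation for variables, raise, handle and application only. It is a simplification written for this illustration and is not the transformation of Figure 8: the datatypes, the global name supply `fresh`, and the helper `cpsTail` (which may leave administrative redexes that the one-pass transformation would avoid) are all assumptions, and generated names are not checked against source names.

```sml
datatype exp = Var of string | Raise of exp
             | Handle of exp * string * exp        (* e1 handle x => e2 *)
             | Apply of exp * exp
datatype tm = LetCont of string * string * tm * tm (* letcont k x = K in L *)
            | AppCont of string * string           (* k x *)
            | App of string * string * string * string   (* f k h x *)

(* Hypothetical fresh-name supply. *)
val counter = ref 0
fun fresh base = (counter := !counter + 1; base ^ Int.toString (!counter))

(* cps e h kappa: h is the handler continuation in scope,
   kappa is a meta-level continuation receiving the result variable. *)
fun cps (Var x) h kappa = kappa x
  | cps (Raise e) h kappa = cps e h (fn z => AppCont (h, z))
  | cps (Handle (e1, x, e2)) h kappa =
      let val j = fresh "j" val y = fresh "y" val h' = fresh "h"
      in LetCont (j, y, kappa y,
           LetCont (h', x, cpsTail e2 h j, cpsTail e1 h' j))
      end
  | cps (Apply (e1, e2)) h kappa =
      cps e1 h (fn f => cps e2 h (fn v =>
        let val k = fresh "k" val x = fresh "x"
        in LetCont (k, x, kappa x, App (f, k, h, v)) end))
(* Translation against a named continuation k. *)
and cpsTail e h k = cps e h (fn z => AppCont (k, z))
```

For instance, `cps (Raise (Var "e")) "h" kappa` ignores `kappa` entirely and yields the continuation application `h e`, which is exactly the reading of raise as an application of the handler.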

3.2 Rewrites 

The rewrites of Figure 5 can be adapted easily to λ^T_CPS, and extended
with transformations such as ‘loop unrolling’:

β-Rec      letfun f1 k1 h1 x1 = C[f_i k h x]
                  f2 k2 h2 x2 = K2
                  ... fn kn hn xn = Kn
           in K
         → letfun f1 k1 h1 x1 = C[K_i[k/k_i, h/h_i, x/x_i]]
                  f2 k2 h2 x2 = K2
                  ... fn kn hn xn = Kn
           in K

β-RecCont  letcont k1 x1 = C[k_i x]
                   k2 x2 = K2
                   ... kn xn = Kn
           in K
         → letcont k1 x1 = C[K_i[x/x_i]]
                   k2 x2 = K2
                   ... kn xn = Kn
           in K

There are no special rewrites for exception handling, e.g.
corresponding to (raise M) handle x.N → let x <= M in N.
Standard β-reduction on functions and continuations gives us this for
free. For example, the CPS transform of

let fun f x = raise x in f y handle z => (z,z) end

is

letfun f k' h' x = h' x
in letcont j z = (letval z' = (z, z) in k z') in f k j y

which reduces by β-Fun and β-Cont to letval z' = (y, y) in k z'.








Likewise, commuting conversions are not required, in contrast
with monadic languages, where in order to define well-behaved
conversions it is necessary to generalize the usual M handle x => N
construct to try y <= M in N1 unless x => N2, incorporating a
success ‘continuation’ N1 (Benton and Kennedy 2001).


3.3 Other features 

It is straightforward to extend λ^T_CPS with other features useful for
compiling full-scale programming languages such as Standard ML.

• Recursive types of the form μα.τ can be supported by adding
suitable introduction and elimination constructs: a value fold x
and a term let x = unfold y in K.

• Binary products and sums generalize to the n-ary case. For
optimizing representations it is common for intermediate languages
to support functions with multiple arguments and results, and
constructors taking multiple arguments. This is easy: function
definitions have the form f k h x̄ = K, and continuations have
the form k x̄ = K (where x̄ stands for a sequence of variables) and
are used for passing multiple results and for case branches where
the constructor takes multiple arguments.

• Polymorphic types of the form ∀α.τ can be added. Typing contexts
are extended with a set of type variables $\mathcal{V}$. Then to
support ML-style let-polymorphism, each value binding construct
(letval, letfun, and projection) must incorporate polymorphic
generalization. For example:

$$\frac{\mathcal{V},\alpha;\ \Gamma \vdash V : \tau \qquad \mathcal{V};\ \Gamma, x{:}\forall\alpha.\tau;\ \Delta \vdash K\ \mathsf{ok}}{\mathcal{V};\ \Gamma;\ \Delta \vdash \mathsf{letval}\ x = V\ \mathsf{in}\ K\ \mathsf{ok}}\ (\textit{letv})$$

For elimination, we simply adapt the variable rule (var) to 
incorporate polymorphic specialization: 



3.4 Effect analysis and transformation 

The use of continuations in an explicit ‘handler-passing style’ lends
itself very nicely to an effect analysis for exceptions. Suppose, for
simplicity, that there are a finite number of exception constructors
ranged over by E. We make the following changes to λ^T_CPS:

• We introduce exception set types of the form {E1, ..., En},
representing exception values built with any of the constructors
E1, ..., En. Set inclusion induces a subtype ordering on
exception types, with top type exn representing any exception,
and bottom type {} representing no exception.

• The types of handler continuations in function definitions are
refined to describe the exceptions that the function is permitted
to throw. For example:

(1) letfun f k (h:¬{}) x = K in ...

(2) letfun f k (h:¬exn) x = K in ...

(3) letfun f k (h:¬{E, E'}) x = K in ...

The type in (1) tells us that K never raises an exception, in
(2) the function can raise any exception, and in (3) the function
might raise E or E'.

• Now that handlers are annotated with more precise types, the
function types must reflect this too. We write
$\tau \xrightarrow{\sigma'} \sigma$ for the
type of functions that either return a result of type σ or raise an
exception of type σ' <: exn. Subtyping on function types and
continuation types is specified by the following rules:

$$\frac{\tau_2 <: \tau_1 \qquad \sigma_1 <: \sigma_2 \qquad \sigma_1' <: \sigma_2'}{\tau_1 \xrightarrow{\sigma_1'} \sigma_1 \;<:\; \tau_2 \xrightarrow{\sigma_2'} \sigma_2} \qquad\qquad \frac{\sigma_2 <: \sigma_1}{\neg\sigma_1 <: \neg\sigma_2}$$

Exception effects enable effect-specific transformations (Benton
and Buchlovsky 2007). Suppose that the type of f is
$\tau \xrightarrow{\{E_1\}} \sigma$.
Then we can apply a ‘dead-handler’ rewrite on the following:

letcont h:¬{E1, E2} x = (case x of E1.k1 [] E2.k2) in f k h y
→ letcont h:¬{E1} x = (case x of E1.k1) in f k h y

In fact, there is nothing exception-specific about this rewrite: it is
just employing refined types for constructed values. The use of
continuations has given us exception effects ‘for free’.
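The subtype ordering on exception set types and the dead-handler rewrite can be sketched as follows. The representation (`exnTy`, constructor names as strings, handler branches as an association list) and the function names `subExn` and `pruneHandler` are assumptions made for illustration; in particular the sketch treats exn as strictly above every explicit set, even one that happens to list all constructors.

```sml
(* Hypothetical representation: exn (top) or an explicit set {E1,...,En}. *)
datatype exnTy = Exn | Set of string list

fun member (e, es) = List.exists (fn e' => e' = e) es

(* subExn (s, t) holds when s <: t, i.e. set inclusion with exn as top. *)
fun subExn (_, Exn) = true
  | subExn (Exn, Set _) = false
  | subExn (Set es, Set fs) = List.all (fn e => member (e, fs)) es

(* Dead-handler: keep only the branches for exceptions that the callee,
   whose effect is `raised`, may actually throw. *)
fun pruneHandler (Set raised) branches =
      List.filter (fn (e, _) => member (e, raised)) branches
  | pruneHandler Exn branches = branches
```

With a callee effect of `Set ["E1"]`, pruning the branches for E1 and E2 leaves only the E1 branch, mirroring the rewrite above.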

4. Implementing CPS 

Many compilers for functional languages represent intermediate
language terms in a functional style, as instances of an algebraic
datatype of syntax trees, and manipulate them functionally. For
example, the language λ^U_CPS can be implemented by an SML datatype,
here using integers for variables, with all bound variables distinct:

type Var = int and CVar = int
datatype CVal =
    Unit | Pair of Var * Var | Inj of int * Var
  | Lam of CVar * Var * CTm
and CTm =
    LetVal of Var * CVal * CTm
  | LetProj of Var * int * Var * CTm
  | LetCont of CVar * Var * CTm * CTm
  | AppCont of CVar * Var
  | App of Var * CVar * Var
  | Case of Var * CVar * CVar

Rewrites such as those of Figure 5 are then implemented by a
function that maps terms to terms, applying as many rewrites as
possible in a single pass. Here is a typical fragment that applies the
β-Pair and Dead-Val reductions:

fun simp census env S K =
  case K of
    LetVal(x, V, L) =>
      if count(census,x) = 0    (* Dead-Val *)
      then simp census env S L
      else LetVal(x, simpVal census env S V,
                  simp census (addEnv(env,x,V)) S L)
  | LetProj(x, 1, y, L) =>
      let val y' = applySubst S y
      in case lookup(env, y') of
           (* Beta-Pair *)
           Pair(z,_) =>
             simp census env (extendSubst S (x,z)) L
         | _ => LetProj(x, 1, y', simp census env S L)
      end
  (* ... remaining cases elided ... *)

In addition to the term K itself, the simplifier function simp
takes a parameter env that tracks letval bindings, a parameter S
used to substitute variables for variables and a parameter census
that maps each variable to the number of occurrences of the
variable, computed prior to applying the function.
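For illustration, a census over the datatype above could be computed as in the following sketch. The representation of the census as a plain list of use occurrences, and the names `usesVal`, `usesTm` and `mkCensus`, are assumptions of the sketch (only `count` appears in the fragment above); a real implementation would use a table, and this version counts only value variables, not continuation variables.

```sml
type Var = int and CVar = int
datatype CVal =
    Unit | Pair of Var * Var | Inj of int * Var
  | Lam of CVar * Var * CTm
and CTm =
    LetVal of Var * CVal * CTm
  | LetProj of Var * int * Var * CTm
  | LetCont of CVar * Var * CTm * CTm
  | AppCont of CVar * Var
  | App of Var * CVar * Var
  | Case of Var * CVar * CVar

(* Collect every use occurrence of a value variable (binders excluded). *)
fun usesVal Unit = []
  | usesVal (Pair (x, y)) = [x, y]
  | usesVal (Inj (_, x)) = [x]
  | usesVal (Lam (_, _, K)) = usesTm K
and usesTm (LetVal (_, V, K)) = usesVal V @ usesTm K
  | usesTm (LetProj (_, _, y, K)) = y :: usesTm K
  | usesTm (LetCont (_, _, K, L)) = usesTm K @ usesTm L
  | usesTm (AppCont (_, x)) = [x]
  | usesTm (App (f, _, x)) = [f, x]
  | usesTm (Case (x, _, _)) = [x]

(* Hypothetical census: a list of occurrences, queried by counting. *)
type census = Var list
fun mkCensus K : census = usesTm K
fun count (census, x) = List.length (List.filter (fn y => y = x) census)
```

Because the census is computed once, before simplification starts, any count it reports can only overestimate the number of occurrences remaining after reductions have been applied.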


The census becomes out-of-date as reductions are applied, and
this may cause reductions to be missed until the census is
recalculated and simp applied again. For example, the β-Pair reduction
may trigger a Dead-Val in an enclosing letval binding (consider
letval x = (y1, y2) in ... let z = π1 x in ... where x occurs only
once). Maintaining accurate census information as rewrites are
performed can increase the number of reductions performed in a single
pass (Appel and Jim 1997), but even with up-to-date census
information, it is not possible to perform shrinking reductions
exhaustively in a single pass, so a number of iterations may be
required before all redexes have been eliminated. In the worst case,
this leads to O(n²) behaviour.

What’s more, each pass essentially copies the entire term, leaving
the original term to be picked up by the garbage collector. This
can be expensive. (Nonetheless, the simplicity of our CPS language,
with substitutions only of variables for variables, and the
lack of commuting conversions as are required in ANF or monadic
languages, leads to a very straightforward simplifier algorithm.)

4.1 Graphical representation of terms 

An alternative is to represent the term using a graph, and to perform
rewrites by destructive update of the graph. Appel and Jim (1997)
devised a representation for which exhaustive application of the
shrinking β-reductions of Figure 5 takes time linear in the size of
the term. We improve on their representation to support efficient
η-reductions and other transformations. The representation has three
ingredients.

1. The term structure itself is a doubly-linked tree. Every subterm 
has an up-link to its immediately enclosing term. This supports 
constant time replacement, deletion, and insertion of subterms. 

2. Each bound variable contains a link to one of its free
occurrences, or is null if the variable is dead, and the free
occurrences themselves are connected together in a doubly-linked
circular list. This permits the following operations to be performed
in constant time:

• Determining whether a bound variable has zero, one, or
more than one occurrence, and if it has only one occurrence,
locating that occurrence.

• Determining whether a free variable is unique.

• Merging two occurrence lists.

Furthermore, we separate recursive and non-recursive uses of
variables; in essence, instead of letfun f k h x = K in L we
write let f = rec g k h x.K[g/f] in L. This lets us detect
Dead-* and β-*-Lin reductions.

3. Free occurrences are partitioned into same-binder equivalence 
classes by using the union-find data structure (Cormen et al. 
2001) 2 . The representative in each equivalence class (that is, the 
root of the union-find tree) is linked to its binding occurrence. 
This supports amortized near-constant time access to the binder 
(the find operation) and merging of occurrence lists (the union 
operation). 

Substitution of variable x for variable y is implemented in
near-constant time by (a) merging the circular lists of occurrences so
that x now points to the merged list, and (b) applying a union
operation so that the occurrences of y are now associated with the
binder for x.
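Step (b) can be sketched with a conventional union-find structure using path compression and union by rank. This is an illustrative sketch only: the `cell` type, the use of strings to name binders, and the function names are assumptions, and the doubly-linked tree and circular occurrence lists of ingredients 1 and 2 are not modelled.

```sml
(* One cell per free occurrence. A root records the binder and a rank. *)
datatype cell = Root of string * int     (* binder name, rank *)
              | Link of cell ref         (* parent pointer *)

(* find with path compression: returns the root cell of c's class. *)
fun find (c : cell ref) : cell ref =
  case !c of
      Root _ => c
    | Link p => let val r = find p in c := Link r; r end

fun binderOf c =
  case !(find c) of
      Root (b, _) => b
    | Link _ => raise Fail "find returned a non-root"

(* Substitute x for y: merge y's class into x's, keeping x's binder. *)
fun substitute (xOcc, yOcc) =
  let val rx = find xOcc and ry = find yOcc
  in if rx = ry then ()
     else case (!rx, !ry) of
         (Root (bx, kx), Root (_, ky)) =>
           if kx >= ky
           then (ry := Link rx;
                 rx := Root (bx, if kx = ky then kx + 1 else kx))
           else (rx := Link ry; ry := Root (bx, ky))
       | _ => raise Fail "find returned a non-root"
  end
```

After `substitute (xOcc, yOcc)`, calling `binderOf` on any occurrence that previously belonged to y reports x's binder, without visiting the occurrences individually.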

Consider the following value term, with doubly-linked tree
structure and union-find structure implicit but with binder-to-free
pointer shown as a dotted arrow and circular occurrence lists shown
as solid arrows:

[Figure: term graph for the example term]


2 Readers familiar with type inference may recall that union-find underpins
the almost-linear time algorithm for term unification (Baader and Nipkow
1998).



Now suppose that we wish to apply β-Pair to the projection π1 p.
Using the find operation on the union-find structure we can locate
the pair (x, y) in near-constant time. Now we substitute x for z by
disconnecting z’s binder from its circular list and connecting x’s
occurrence list in its place, and merging the two lists, in constant
time. At the same time, we apply the union operation to merge the
binder equivalence classes (not shown).

[Figure: term graph after substituting x for z]



Finally we remove the projection itself, deleting the occurrence of p 
from the circular list, again in constant time: 


[Figure: term graph after removing the projection]



One issue remains: the classical union-find data structure does not
support deletion. There are recent techniques that extend union-find
with amortized near-constant time deletion (Kaplan et al. 2002).
However, the representation is non-trivial, and might add
unacceptable overhead to the union and find operations, so we chose
instead a simpler solution: do nothing! Deleted occurrences remain in
the union-find data structure, possibly as root nodes, or as nodes on
the path to the root. In theory, the efficiency of rewriting is then
dependent on the ‘peak’ size of the term, not its current size, but we
have not found this to be a problem in practice.

Each of the shrinking reductions of Figure 5 can be implemented
in almost-constant time using our graph representation. To
put these together and apply them exhaustively on a term, we
follow Appel and Jim (1997):

• First sweep over the term, detecting redexes and collecting them 
in a worklist. 

• Then pull items off the worklist one at a time (in any order),
applying the appropriate rewrite, and adding new redexes to
the worklist that are triggered by the rewrite. For example,
the removal of a free occurrence (as can happen for multiple
variables when applying Dead-Val) can induce a Dead-*
reduction (if no occurrences remain) or a β-*-Lin reduction
(if only a single occurrence remains).


In the current implementation, the worklist is represented as a 
queue, but it should be possible to thread it through the term itself. 
Shrinking reductions could then be performed with constant space 
overhead. 

4.2 Comparison with Appel/Jim 

The representation of Appel and Jim (1997) did not make use of 
union-find to locate binders. Instead, (a) the circular list of variable 
occurrences included the bound occurrence, thus giving constant 
time access to the binder in the case that the free variable is unique, 
and (b) for letval-bound variables, each free occurrence contained 
an additional pointer to its binder. When performing a substitution 
operation, these binder links must be updated, using time linear in 
the number of occurrences; fortunately, for any particular variable 
this can happen only once during shrinking reductions, as letval- 
bound variables cannot become rebound. Thus the cost is amortized 
across the shrinking reductions. 

Unfortunately the lack of binder occurrences for non-letval-bound
variables renders less efficient other optimizations such as
η-reduction. Take an instance of η-Pair:

let x1 = π1 x in C[let x2 = π2 x in C'[letval y = (x1, x2) in K]]
→ let x1 = π1 x in C[let x2 = π2 x in C'[K[x/y]]]

Just to locate the binder for x1 and x2 would take time linear in the
number of occurrences.

Our use of union-find gives us efficient implementation of all
shrinking reductions, and of other transformations too; moreover,
when analysing efficiency we need not be concerned whether
variables are letval-bound or not.

4.3 Performance results 

We have modified the SML.NET compiler to make use of a typed 
CPS intermediate language only mildly more complex than that 
shown in Figure 7. It employs the graphical representation of terms 
described above; in particular, the simplifier performs shrinking 
reductions exhaustively on a term representing the whole program, 
and it is invoked a total of 15 times during compilation. 

Table 1 presents some preliminary benchmark results showing
average time spent in simplification, time spent in
monomorphisation, and time spent in unit-removal (e.g. transformation of
unit*int values to int). We compare (a) the released version of
SML.NET, implementing a monadic intermediate language (MIL)
and functional-style simplification algorithm, (b) the Appel/Jim-style
graph representation adapted to MIL terms implemented by
Lindley (Benton et al. 2004a; Lindley 2005), and (c) the new
graph-based CPS representation with union-find. Tests were run on a
3 GHz Pentium 4 PC with 1 GB of RAM running Windows Vista.
The SML.NET compiler is implemented in Standard ML and compiled
using the MLton optimizing compiler, which generates high
quality code from both functional and imperative coding styles - so
giving both techniques a fair shot.

As can be seen from the figures, the graph-based simplifier for
the monadic language is significantly faster than the functional
simplifier - and although all times are small, bear in mind that the
simplifier is run many times during compilation. Unit removal is
roughly comparable in performance across implementations.
Interestingly, the graph-based CPS implementation of monomorphisation
runs up to twice as slowly as the functional monadic
implementation. We conjecture that this is because monomorphisation
necessarily copies (and specializes) terms, and CPS terms tend to
be larger than MIL terms, and the graph representation is larger
still.

These figures come with a caveat: the comparison is somewhat
“apples and oranges”. There are differences between the MIL,
g-MIL and g-CPS representations that are unrelated to monads or
CPS. Future work is to make a fairer comparison, implementing
a functional version of the CPS terms, and perhaps also a precise
monadic analogue.

Table 1. Optimization times (in seconds)

Benchmark   Lines    Phase    MIL    g-MIL   g-CPS
raytrace     2,500   Simp     0.12   0.01    0.01
mlyacc       6,200   Simp     0.44   0.02    0.02
smlnet      80,000   Simp     7.29   0.29    0.15
                     Mono     0.75   n/a     1.41
                     Deunit   0.76   1.3     0.6
hamlet      20,000   Simp     0.97   0.08    0.04
                     Mono     0.15   n/a     0.19
                     Deunit   0.12   0.16    0.14

5. Contification 

Our CPS languages make a syntactic distinction between functions
and local continuations. The former are typically compiled as
heap-allocated closures or as known functions, whilst the latter can
always be compiled as inline code with continuation applications
compiled as jumps. For efficiency it is therefore desirable to
transform functions into continuations, a process that has been termed
contification (Fluet and Weeks 2001).

Functions can be contified when they always return to the same 
place. Consider the following code written in the subset of SML 
studied in Section 2: 

let fun f x = ...
in g (case d of in1 d1 => f y | in2 d2 => f d2) end

If f returns at all, it must pass control to g. Here, this is obvious, 
but for more complex examples it is not so apparent. Now consider 
its CPS transform: 

letval f = (λ k x. ··· k ···) in
letcont k0 w = g r w in
letcont j1 d1 = f k0 y in
letcont j2 d2 = f k0 d2 in
case d of j1 [] j2

It is clear that f is always passed the same continuation k0 - and
so, unless it diverges, it must return through k0 and so pass control
to g. We can transform f into a local continuation, as follows:

letcont k0 w = g r w in
letcont j x = ··· k0 ··· in
letcont j1 d1 = j y in
letcont j2 d2 = j d2 in
case d of j1 [] j2

We have done three things: (a) we have replaced the function f by
a continuation j, deleting the return continuation at both definition
and call sites, (b) we have substituted the argument k0 for the
formal k in the body of f, and (c) we have moved j so that it is
in the scope of k0.

Fluet and Weeks (2001) use the dominator tree of a program’s
call graph to contify programs that consist of a collection of
mutually-recursive first-order functions. They show that their
algorithm is optimal: no contifiable functions remain after applying
the transformation. Their dominator-based analysis can be adapted
to our CPS languages, and is simpler to describe in this context
because all function definitions and uses have a named continuation
(Fluet and Weeks use named continuations only for non-tail calls).
When applied to top-level functions, the transformation is simpler
too, but in the presence of first-class functions and general block
structure the transformation becomes significantly more complex
to describe.


We prefer an approach based on incremental transformation, in
essence repeatedly applying the rewrite illustrated above until no
further rewrites are possible. We consider first the case of
non-recursive functions, then generalize to mutually-recursive
functions, and conclude by relating our technique to dominator-based
contification.

5.1 Non-recursive functions 

In the untyped language λ^U_CPS without recursion, it is particularly
straightforward to spot contifiable functions: they are those for
which all occurrences are applications with the same continuation
argument. We define the following rewrite:

Cont (f not free in C, D and D minimal):

letval f = λ k x.K in C[D[f k0 x1, ..., f k0 xn]]
→ C[letcont j x = K[k0/k] in D[j x1, ..., j xn]]

Here C is a single-hole context as presented in Figure 5 and D is a
multi-hole context whose formalization we omit.
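The side condition of Cont amounts to checking that every occurrence of f is an application with one common continuation. A minimal sketch of that check follows; the string-named syntax tree, the `usage` datatype and the function `commonCont` are hypothetical, and the sketch relies on all bound variables being distinct, so that shadowing need not be considered.

```sml
(* Hypothetical untyped CPS syntax with string names. *)
datatype value = Unit | Pair of string * string | Lam of string * string * term
and term =
    LetVal of string * value * term
  | LetCont of string * string * term * term
  | AppCont of string * string
  | App of string * string * string          (* f k x *)
  | Case of string * string * string

datatype usage = Call of string   (* f applied with this continuation *)
               | Escape           (* f used as a first-class value *)

(* One Escape for each occurrence of f among xs. *)
fun esc f xs = List.map (fn _ => Escape) (List.filter (fn x => x = f) xs)

fun usesV f Unit = []
  | usesV f (Pair (a, b)) = esc f [a, b]
  | usesV f (Lam (_, _, body)) = usesT f body
and usesT f (LetVal (_, v, t)) = usesV f v @ usesT f t
  | usesT f (LetCont (_, _, t1, t2)) = usesT f t1 @ usesT f t2
  | usesT f (AppCont (_, x)) = esc f [x]
  | usesT f (App (g, k, x)) = (if g = f then [Call k] else []) @ esc f [x]
  | usesT f (Case (x, _, _)) = esc f [x]

(* SOME k0 if f occurs in scope only as  f k0 x  for a single k0. *)
fun commonCont f scope =
  case usesT f scope of
      Call k0 :: rest =>
        if List.all (fn u => u = Call k0) rest then SOME k0 else NONE
    | _ => NONE
```

Applied to the scope of f in the CPS term of the previous section, `commonCont "f"` would return `SOME "k0"`; a dead function or one that escapes yields `NONE`.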

The Cont rewrite combines three actions: (a) the function f
is replaced by a continuation j, with each application replaced
by a continuation application; (b) the common continuation k0 is
substituted for the formal continuation parameter k in the body K
of f; and (c) the new continuation j is pulled into the scope
of the continuation k0. The multi-hole context D is the smallest
context enclosing all uses of f, which ensures that j is in scope
after transformation. The analysis is trivial (just check call sites for
common continuation arguments), yet iterating this transformation
leads to optimal contification, in the sense of Fluet and Weeks
(2001). Here is an example adapted from loc. cit. §5.2,
letval h = λ kh xh. ··· in
letval g1 = λ k1 x1. ··· h k1 z1 ··· k1 z8 ··· in
letval g2 = λ k2 x2. ··· h k2 z2 ··· in
letval f = λ kf xf. ··· g1 kf z3 ··· g2 kf z4 ··· g2 kf z5 ··· in
letval m = λ km xm. ··· f j1 z6 ··· f j2 z7 in ...

We can immediately see that g1 and g2 (but not h) are always
passed the same continuation kf, and so we can apply Cont to
contify them both:

letval h = λ kh xh. ··· in
letval f = λ kf xf.
    (letcont kg1 x1 = ··· h kf z1 ··· kf z8 ··· in
     letcont kg2 x2 = ··· h kf z2 ··· in
     ··· kg1 z3 ··· kg2 z4 ··· kg2 z5 ···) in
letval m = λ km xm. ··· f j1 z6 ··· f j2 z7 in ...

Now h can be contified as it is always passed kf:

letval f = λ kf xf.
    (letcont kh xh = ··· in
     letcont kg1 x1 = ··· kh z1 ··· kf z8 ··· in
     letcont kg2 x2 = ··· kh z2 ··· in
     ··· kg1 z3 ··· kg2 z4 ··· kg2 z5 ···) in
letval m = λ km xm. ··· f j1 z6 ··· f j2 z7 in ...

5.2 Recursive functions 

Generalizing to recursive functions and continuations is a little
trickier. Suppose we have a λ^T_CPS term of the form

letfun f1 k1 h1 x1 = K1
       ...
       fn kn hn xn = Kn
in K.

A set of functions F ⊆ {f1, ..., fn} can be contified collectively,
written Contifiable(F), if there is some pair of continuations k0
and h0 such that each occurrence of f ∈ F is either a tail call
within F or is a call with continuation arguments k0 and h0.
Intuitively, each function (eventually) returns to the same place (k0),
or throws an exception that is caught by the same handler (h0),
though control may pass tail-recursively through other functions
in F. There may be many such subsets F; we assume that F is in
fact strongly-connected with respect to tail calls contained within it
(or is a trivial singleton with no tail calls). Then for a given letfun
term there is a unique partial partition of the functions into disjoint
subsets satisfying Contifiable(−).

Let F = {f1, ..., fm}. Define a translation on function
applications

$$(f\ k\ h\ x)^{*} = \begin{cases} j_i\ x & \text{if } f = f_i \in F \\ f\ k\ h\ x & \text{otherwise} \end{cases}$$

and extend this to all terms. Assuming that Contifiable(F) holds,
there are two possibilities.

1. All applications of the form f k0 h0 x for f ∈ F are in the
term K. Then we can apply the following rewrite, which is the
direct analogue of Cont.

RecCont (f1, ..., fm not free in C, and K minimal):

letfun f1 k1 h1 x1 = K1
       ...
       fn kn hn xn = Kn
in C[K]
→
letfun f_{m+1} k_{m+1} h_{m+1} x_{m+1} = K_{m+1}
       ...
       fn kn hn xn = Kn
in C[letcont j1 x1 = K1*[k0/k1, h0/h1]
             ...
             jm xm = Km*[k0/km, h0/hm]
     in K*]

2. Otherwise, all applications of the form f k0 h0 x for f ∈ F
are in the body of one of the functions outside of F; without
loss of generality we assume this is fn.

RecCont2 (f1, ..., fm not free in C, and Kn minimal):

letfun f1 k1 h1 x1 = K1
       ...
       f_{n-1} k_{n-1} h_{n-1} x_{n-1} = K_{n-1}
       fn kn hn xn = C[Kn]
in K
→
letfun f_{m+1} k_{m+1} h_{m+1} x_{m+1} = K_{m+1}
       ...
       f_{n-1} k_{n-1} h_{n-1} x_{n-1} = K_{n-1}
       fn kn hn xn =
         C[letcont j1 x1 = K1*[k0/k1, h0/h1]
                   ...
                   jm xm = Km*[k0/km, h0/hm]
           in Kn*]
in K

For an example of the latter, more complex, transformation,
consider the following SML code:

let fun unif(Ap(a,xs),Ap(b,ys)) = (unif(a,b);unifV(xs,ys))
      | unif(Ar(a,b),Ar(c,d)) = unifV([a,b],[c,d])
    and unifV(x::xs,y::ys) = (unif(x,y);unifV(xs,ys))
      | unifV([],[]) = ()
in unif end

The function unifV can be contified into the definition of unif: it
tail-calls itself, and its uses inside unif have the same continuation.

5.3 Comparing dominator-based contification

The dominator-based approach of Fluet and Weeks (2001) can be 
recast in our CPS language as follows. (For simplicity we do not 
consider exception handler continuations here). First construct a 
continuation flow graph for the whole program. Nodes consist of 
continuation variables and a distinguished root node. Then for each 


function / with return continuation k, if / is passed around as a 
first-class value then create an edge from root to k; otherwise, for 
each application f j x create an edge from j to k. Finally, for each 
local continuation k create an edge from root to k. 
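The construction of the continuation flow graph just described can be sketched as a function from a summary of the program to a list of edges. The summary format (a record per function giving its return continuation, whether it escapes, and the continuations passed at its call sites) and all names below are assumptions made for this illustration.

```sml
(* Hypothetical per-function summary. *)
type func = {ret : string,              (* return continuation k of f *)
             escapes : bool,            (* f is used as a first-class value *)
             callConts : string list}   (* j for each application  f j x *)

(* Edges of the continuation flow graph, as (source, target) pairs. *)
fun flowEdges (funs : func list, localConts : string list) =
  List.concat
    (List.map (fn {ret, escapes, callConts} =>
                 if escapes then [("root", ret)]
                 else List.map (fn j => (j, ret)) callConts)
              funs)
  @ List.map (fn k => ("root", k)) localConts
```

For a function with return continuation k that is only ever called with continuation k0, the sole edges into k come from k0, which is the situation in which k can be merged into k0.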

The non-recursive Cont rewrite has the effect of merging two 
nodes in the graph, as follows: 


The recursive RecCont and RecCont 2 rewrites are similar, 
except that in place of k we have a strongly-connected component 



Conversely, any part of the flow graph matching the left-hand-side 
of this diagram corresponds to a contifiable subset of functions in a 
letfun to which the RecCont or RecCont 2 rules can be applied. 

It is immediately clear that exhaustive rewriting terminates,
as the flow graph decreases in size with each rewrite, eventually
producing a graph with no occurrences of the pattern above.

The algorithm described by Fluet and Weeks (2001) contifies k
if it is strictly dominated by some continuation j whose immediate
dominator is root. It can be shown that if a rooted graph contains
such a pair of nodes j and k, then some part of the graph matches
the pattern above. Hence exhaustive rewriting has the same effect
as optimal contification based on dominator trees.

6. Related work and conclusion 

The use of continuation-passing style for functional languages has
its origins in Scheme compilers (Steele 1978; Kranz et al. 1986).
It later formed the basis of the Standard ML of New Jersey
compiler (Appel 1992; Shao and Appel 1995).

In early compilers, lambdas originating from the CPS
transformation were not distinguished from lambdas present in the source,
so some effort was expended at code generation time to determine
which lambdas could be stack-allocated and which could be
heap-allocated. Later compilers made a syntactic distinction between
true functions and ‘second-class’ continuations introduced by CPS;
and sometimes transformed one into the other (Kelsey and Hudak
1989), though contification was not studied formally.

A number of more recent compilers use what has been called 
almost CPS. The Sequentialized Intermediate Language (SIL) 
employed by Tolmach and Oliva (1998) is a monadic-style language in 
which a letcont-like feature is used to introduce join points. 
Somewhat closer to our CPS language is the First Order Language (FOL) 
of the MLton compiler (Fluet and Weeks 2001). It goes further than 
SIL in making use of named local continuations in all branch 
constructs and non-tail calls. However, functions are not parameterized 
on return (or handler) continuations, and there is special syntax for 
tail calls and returns. This non-uniform treatment of continuations 
complicates transformations - inlining of non-tail functions must 
replace all ‘return points’ with jumps, and the contification 
analysis and transformation must treat tail and non-tail calls differently. 

We have found the uniform treatment of continuations in our 
CPS language to be a real benefit, not only as a simplifying force in 
implementation, but also in thinking about compiler optimizations: 
contification, in particular, is difficult to characterize in the absence 
of a notion of continuation passing. 

As far as we are aware, we are the first to implement linear-time 
shrinking reductions in the style of Appel and Jim (1997). An 
earlier term-graph implementation by Lindley was for a monadic 
language and had worst-case O(n^2) behaviour due to commuting 
conversions (Benton et al. 2004a; Lindley 2005). Shivers and Wand 
(2005) have proposed a rather different graph representation for 
lambda terms, with the goal of sharing subterms after β-reduction. 
The representation does bear some resemblance to ours, though, 
with up-links from subterms to enclosing terms, and circular lists 
that connect the sites where a term is substituted for a variable. 

This paper would not be complete without a mention of Static 
Single Assignment form (SSA), the currently fashionable intermediate 
representation for imperative languages. As is well known, 
SSA is in some sense equivalent to CPS (Kelsey 1995) and to 
ANF (Appel 1998). Its focus is intra-procedural optimization (as 
with ANF, it’s necessary to renormalize when inlining functions, 
in contrast to CPS) and there is a large body of work on such 
optimizations. Future work is to transfer SSA-based optimizations to 
CPS. We conjecture that CPS is a good fit for both functional and 
imperative paradigms. 